Bibliography

[164] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In Proceedings of the International Conference on Learning Representations, pages 1–18, 2017.

[165] Ziyang Luo, Artur Kulmizev, and Xiaoxi Mao. Positional artefacts propagate through masked language model embeddings. arXiv preprint arXiv:2011.04393, 2020.

[166] X. Ma, P. Zhang, S. Zhang, N. Duan, Y. Hou, D. Song, and M. Zhou. A tensorized transformer for language modeling. In Advances in Neural Information Processing Systems, 2019.

[167] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. In ICLR, 2017.

[168] Brais Martinez, Jing Yang, Adrian Bulat, and Georgios Tzimiropoulos. Training binary neural networks with real-to-binary convolutions. arXiv preprint arXiv:2003.11535, 2020.

[169] Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, and Jingdong Wang. Conditional DETR for fast training convergence. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 3651–3660, 2021.

[170] Xiangming Meng, Roman Bachmann, and Mohammad Emtiyaz Khan. Training binary neural networks using the Bayesian learning rule. In International Conference on Machine Learning, pages 6852–6861. PMLR, 2020.

[171] D Messerschmitt. Quantizing for maximum output entropy (corresp.). IEEE Transactions on Information Theory, 17(5):612–612, 1971.

[172] Paul Michel, Omer Levy, and Graham Neubig. Are sixteen heads really better than one? Advances in Neural Information Processing Systems, 32, 2019.

[173] Luca Mocerino and Andrea Calimera. Tentaclenet: A pseudo-ensemble template for accurate binary convolutional neural networks. In 2020 2nd IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS), pages 261–265. IEEE, 2020.

[174] Jonas Mockus, Vytautas Tiesis, and Antanas Zilinskas. The application of Bayesian methods for seeking the extremum. Towards global optimization, 2(117-129):2, 1978.

[175] Todd K Moon. The expectation-maximization algorithm. IEEE Signal Processing Magazine, 13(6):47–60, 1996.

[176] Jean-Jacques Moreau. Proximité et dualité dans un espace hilbertien. Bulletin de la Société mathématique de France, 93:273–299, 1965.

[177] Matthias Mueller, Neil Smith, and Bernard Ghanem. A benchmark and simulator for UAV tracking. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 445–461. Springer, 2016.

[178] Prasanna Kumar Muthukumar and Alan W Black. A deep learning approach to data-driven parameterizations for statistical parametric speech synthesis. arXiv preprint arXiv:1409.8558, 2014.